Abstract

In this paper, I present the design and implementation 
of Clown — a simulator of a microprocessor-based com- 
puter system specifically optimized for teaching operat- 
ing system courses at undergraduate or graduate levels. 
The package includes the simulator itself, as well as a col- 
lection of basic I/O devices, an assembler, a linker, and a 
disk formatter. The simulator architecturally resembles 
mainstream microprocessors from the Intel 80386 family, 
but is much easier to learn and program. The simulator
is fast enough to be used as an emulator in direct user
interaction mode.

A NEED FOR A SIMULATOR 

An important part of the agenda of a college-level
operating system course is to examine the interaction
between an operating system and computer hardware.
Assembly programming teaches students to think logically
and to waste no byte and no CPU cycle. Knowing the
hardware helps students understand the operation of such
foundational mechanisms as memory protection, process
dispatching, input/output, and file system organization.
It also clarifies the motivations behind certain OS design
decisions. Last but not least, from the practical point of
view, exposing students to low-level programming prepares
them for potential projects involving embedded systems
and hand-held devices.

Elements of low-level assembly programming can also
be found in computer architecture courses. Many
universities (Suffolk University being one of them)
continue to offer general assembly programming courses
where students learn how to extract the ultimate
performance from the computer hardware.

Traditionally, colleges have been using various RISC
architectures (such as MIPS or RS6000) or the Motorola 68x
family as their primary hardware platforms. RISC cores
are reasonably simple and regular. However, this trend
seems to be rapidly disappearing in favor of the industrial
mainstream Intel32 architecture. It should also be noted
that, from the OS development point of view, RISC cores
lack many important features, such as segmentation (for
superior memory protection) and non-trap-based system
call support.

On the other hand, the Intel32 CISC architecture is hard
to learn. The instruction set is redundant, and the
instruction format is highly irregular. This makes Intel32
system programming challenging, especially for
undergraduate students. A need clearly exists for a good
microprocessor simulator that could be used in an OS
course (and possibly in other related courses).

EXISTING SIMULATORS 

Many microprocessor simulators have been devel- 
oped, but most of them do not address the topic from 
the OS study point of view. 

Some of them simulate RISC or otherwise
"inappropriate" targets (e.g., Ant-32 [4], MicSim [6],
Microprocessor Trainer Simulator [2], and various Intel
8085 simulators, such as those described in [11]).

Other simulators are too detailed (such as
VMware [12] and SID [8]). They simulate the computer
hardware as closely as possible, thus defeating the
whole purpose of using a simulator in an
undergraduate-level class. On the other hand, many
simulators designed for educational purposes are
oversimplified (MSFB [1], also [5] and [2]). While good for
an introductory computer hardware course, they fail to
provide substantial mechanisms for building advanced
operating systems.

To summarize, existing simulators are either optimized
for use in industry or in a hardware-oriented course [7],
but not in a "classic" OS course [9], or they intentionally
hide hardware from the upper OS layers.

A wish list for an OS-optimized simulator includes 
the following requirements: 

• Rich support for OS concepts.

• Little or no support for application-specific features,
such as string operations and floating-point unit (to
reduce complexity and learning time).

• Reasonably detailed simulation (to make sure that
the simulator could also be used in a computer
architecture course).

• A collection of basic I/O devices, with a mechanism
for adding more devices, if needed.

• Fast execution, ideally in real-time emulation mode.

• A simple interface.

• A substantial set of development tools (such as
assembler, linker, disk editor, debugger, C compiler).

Figure 1. Clown system architecture

To satisfy these requirements, I developed Clown 
— a new simulator of an Intel-style microprocessor and 
computer system specifically tuned to the needs of the 
courses mentioned above. 

CLOWN OVERVIEW 

The Clown simulator suite is partially based on the
Simple Hard Disk Emulator (SHaDE [13]) written at Suffolk
University as a simple vehicle for teaching the low-level
organization of file systems.

The system architecture of Clown is shown in Fig- 
ure 1. 



The simulator consists of a Clown CPU with a
single-level direct-mapped write-back cache (included to
simulate DMA transfers accurately), one bank of 32-bit
non-interleaved memory, 32-bit system and I/O buses
with an implied bridge (the bridge is not simulated, and
both buses are treated as one bus), 256 I/O ports (Intel
has 65,536 ports), 16 interrupt channels, and one DMA
channel (Intel typically has 7 DMA channels). Four basic
I/O devices are included in the standard configuration.

The architecture of the Clown CPU is shown in Fig- 
ure 2. 

The CPU has sixteen 32-bit general-purpose registers
(Intel: 8 GPR and two control registers, CR0 and CR3),
eight 32-bit segment registers (Intel: 6 segment and 4
memory management registers; the number eight is
redundant and can be reduced to six), one 16-bit flag
register, an instruction register, and a program counter.
A 16-entry direct-mapped Translation Look-aside Buffer
contributes to the accuracy of DMA transfers (the Intel
80386 has a 32-entry 4-way set-associative TLB). There
is no dedicated stack pointer register, page table base
register, or page fault address register. Their functions
are assigned to general-purpose registers %R13 through
%R15.

Clown supports only one data type: signed 32-bit
word (for comparison, Intel supports at least 12 data
types [10]). This feature drastically simplifies system
programming. On the other hand, it poses interesting
challenges to compiler developers (such as type
representations and conversions, and implementation of
floating-point arithmetic).

San Jose, July 2004

Proceedings of Summer Computer Simulation Conference

Figure 2. Clown CPU architecture; shaded registers
and flags are not accessible from programs
(The figure shows the general-purpose registers %R0
through %R15, with %R13, %R14, and %R15 doubling as %SP,
%PAGE, and %FAR; the segment registers, including %ISR,
%GDT, %LDT, %CS, %SS, %DS, and %ES; the 16-entry TLB
translating linear addresses into physical addresses; and
the flag register with the CPL, IOPL, I, O, S, Z, and C
fields.)

The Clown CPU supports both paging and segmen- 
tation. Either memory organization mechanism can be 
turned off (by disabling the page table or by declaring 
all memory to be one big implicit segment). 

In the Intel architecture, an interrupt vector (IV) 
is always treated as an array of segment descriptors, 
each identifying an entry into an interrupt service rou- 
tine (ISR). In pure paging mode, it would be highly de- 
sirable to have no segments whatsoever, including the 
ISRs. This is accomplished by forcing all ISRs to be 
8-word aligned, and treating the least significant bit of 
an IV entry as a mode bit. When this bit is clear, the 
IV entry is treated as a segment descriptor. If the bit 
is set, the entry is treated as the direct address of the 
entry point followed by two protection bits. While this 
approach does not seem elegant enough, it nevertheless 
allows the development of segment-free operating sys- 
tems. 
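
The mode-bit encoding described above can be illustrated
with a short Python sketch. The exact bit layout is an
assumption: the text fixes only that the least significant
bit is the mode bit and that ISRs are 8-word aligned, which
leaves the three low-order bits of an entry-point address
free. Here bits 1 and 2 are taken to hold the two
protection bits. The names IVEntry, decode_iv_entry, and
encode_direct_entry are hypothetical and are not part of
the Clown package.

```python
from typing import NamedTuple

MODE_BIT = 0x1          # bit 0: set = direct address, clear = segment descriptor
PROT_MASK = 0x6         # bits 1-2: two protection bits (assumed position)
ADDR_MASK = 0xFFFFFFF8  # upper 29 bits: 8-word-aligned entry point


class IVEntry(NamedTuple):
    is_direct: bool   # True for a segment-free (direct address) entry
    value: int        # entry-point address, or the raw descriptor word
    protection: int   # 0..3 for direct entries, 0 for descriptors


def decode_iv_entry(word: int) -> IVEntry:
    """Split one 32-bit interrupt vector entry according to its mode bit."""
    word &= 0xFFFFFFFF
    if word & MODE_BIT:
        return IVEntry(True, word & ADDR_MASK, (word & PROT_MASK) >> 1)
    # Mode bit clear: the whole word is a segment descriptor (not decoded here).
    return IVEntry(False, word, 0)


def encode_direct_entry(address: int, protection: int) -> int:
    """Build a direct-address entry for an 8-word-aligned ISR."""
    if address % 8 != 0:
        raise ValueError("ISR entry point must be 8-word aligned")
    if not 0 <= protection <= 3:
        raise ValueError("protection must fit in two bits")
    return (address & ADDR_MASK) | (protection << 1) | MODE_BIT
```

The alignment requirement is what makes the scheme work:
because the three low-order bits of every legal entry
point are zero, they can carry the mode and protection
bits without losing address information.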

Compared to the i386, Clown has significantly fewer
instructions, which reduces the learning time (Table 1).
Explicitly omitted are data conversion instructions,
decimal arithmetic, address manipulation, string, and
translation instructions, and high-level language support
instructions.

A Clown instruction consists of either one or two
words. The second word, if present (recognized by the
MSB of the first word, or by the "x" prefix in the
mnemonics), is always the immediate operand.
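
A minimal Python sketch of how a fetch stage might tell
one-word and two-word instructions apart follows. Two
points are assumptions, not statements from the Clown
documentation: that a set MSB (rather than a clear one)
marks the presence of the immediate word, and that memory
is modeled as a word-addressed list. The function names
are hypothetical.

```python
WORD_MASK = 0xFFFFFFFF
MSB = 0x80000000  # assumed: set when an immediate operand word follows


def fetch_instruction(memory, pc):
    """Return (first_word, immediate_or_None, next_pc) for the instruction at pc."""
    first = memory[pc] & WORD_MASK
    if first & MSB:
        # Two-word instruction: the second word is the immediate operand.
        return first, memory[pc + 1] & WORD_MASK, pc + 2
    return first, None, pc + 1


def split_instructions(memory):
    """Walk a word-addressed program image, yielding (address, first_word, immediate)."""
    pc = 0
    while pc < len(memory):
        first, immediate, next_pc = fetch_instruction(memory, pc)
        yield pc, first, immediate
        pc = next_pc
```

Because the length is known from the first word alone,
the program counter can be advanced before the instruction
is decoded any further.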

The number of flags has also been minimized. There
are only 7 externally visible flags: Carry, Zero, Sign,
Overflow, Interrupts (enabled), and two I/O Privilege
Level flags (compared to 13 flags in the Intel 80386).

Clown runs a simple fetch-decode-execute loop. The 
execution of each instruction takes exactly one Clown cy- 
cle. External interrupts are reported and queued at the 
end of a cycle. Nested interrupts are permitted, with 
high-precedence interrupts preempting low-precedence 
interrupts. This simulation model may change in the 
future to better reflect modern pipelined architectures 
and their impact on process context switches. 
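
The end-of-cycle interrupt handling described above can
be sketched in Python as follows. This is an illustration
under stated assumptions, not the simulator's code: a
lower channel number is assumed to mean higher precedence
(the text does not say which way the ordering runs), the
interrupt-enable flag is ignored, and the class and method
names are hypothetical.

```python
NUM_CHANNELS = 16  # from the text: 16 interrupt channels


class InterruptState:
    def __init__(self):
        self.pending = set()  # IRQs reported but not yet serviced
        self.active = []      # stack of IRQs whose handlers are running

    def end_of_cycle(self, raised=()):
        """Queue IRQs raised during this cycle; return one to dispatch, or None."""
        for irq in raised:
            if not 0 <= irq < NUM_CHANNELS:
                raise ValueError("no such interrupt channel")
            self.pending.add(irq)
        if not self.pending:
            return None
        best = min(self.pending)  # assumed: lower number = higher precedence
        if self.active and best >= self.active[-1]:
            return None           # does not outrank the running handler
        self.pending.remove(best)
        self.active.append(best)  # nested: preempts whatever was running
        return best

    def handler_return(self):
        """Called when the running handler finishes."""
        self.active.pop()
```

A main loop would execute one instruction, call
end_of_cycle with the channels raised by the devices
during that cycle, and transfer control to the returned
handler, if any.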

PERIPHERAL DEVICES 

Currently, Clown has four peripheral devices: an
interval timer, a terminal, a hard disk controller, and a
direct memory access (DMA) controller. Each device has a
configurable I/O base and a configurable IRQ channel.
All devices can operate in both polling and interrupt
modes.

The interval timer works both in interval and single- 
shot modes. It generates an interrupt upon expiration, 
and can be stopped at any time. The following assembly 
code fragment programs the timer to expire (once) in 
1000 cycles: 

#include "config.h"
; reset timer
out 1, (IOBASE_TIMER + 0)
; set the counter and trigger timer
out 1000, (IOBASE_TIMER + 0)
; wait for an interrupt
hlt

The terminal combines a keyboard and a sequen- 
tially accessible (not memory-mapped) display. When 
in the interrupt mode, it generates interrupts on 
keystrokes. The terminal does not echo characters (echo- 
ing is left to the programmer). 

The hard disk controller carefully simulates the
mechanical behavior of a relatively simple hard disk
(including track-to-track and maximum seek latency, and
rotational latency). The dynamic parameters of the disk are
run-time configurable. Inter-sector gaps make it possible
to optimize file systems for high-speed streaming
operations. When in the interrupt mode, the controller
generates interrupts on completion of seek, read, and write
operations. A one-block read-write buffer is
prehistorically tiny, yet sufficient to study the
foundations of the disk I/O subsystem. At most one I/O
request can be pending at any time, so no disk scheduling
is provided.
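
The following Python sketch shows one way such a latency
model could be put together. It is not the Clown
implementation: the linear interpolation between the
track-to-track and maximum seek latencies is an assumption,
and the geometry and timing values are hypothetical
defaults chosen only to make the example concrete. All
times are in simulator cycles.

```python
from dataclasses import dataclass


@dataclass
class DiskModel:
    tracks: int = 80               # hypothetical geometry
    sectors_per_track: int = 16
    track_to_track: int = 100      # cycles to move to an adjacent track
    max_seek: int = 2000           # cycles for a full-stroke seek
    cycles_per_sector: int = 50    # one sector plus its gap passing the head

    def seek_cycles(self, src: int, dst: int) -> int:
        distance = abs(dst - src)
        if distance == 0:
            return 0
        if self.tracks <= 2:
            return self.track_to_track
        # Linear interpolation between the two configured latencies (assumed).
        span = self.max_seek - self.track_to_track
        return self.track_to_track + span * (distance - 1) // (self.tracks - 2)

    def rotational_cycles(self, now: int, sector: int) -> int:
        # Wait until the start of the target sector rotates under the head.
        revolution = self.sectors_per_track * self.cycles_per_sector
        target = sector * self.cycles_per_sector
        return (target - now % revolution) % revolution

    def access_cycles(self, now: int, src_track: int, dst_track: int, sector: int) -> int:
        seek = self.seek_cycles(src_track, dst_track)
        # The platter keeps spinning while the head moves.
        return seek + self.rotational_cycles(now + seek, sector)
```

With a model of this kind, a file system that places
consecutive blocks a suitable number of sectors apart
avoids waiting a full revolution between requests.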

The DMA controller is the most intelligent
peripheral device. Clown carefully simulates DMA
transfers; data are transferred only when the bus is not
used by the main CPU. A transfer unit is fixed and equal to
one disk block (one virtual memory page). When in the
interrupt mode, the controller generates an interrupt on
completion of the transfer (which happens after the
completion of the respective disk read operation or before
the completion of the respective disk write operation).

Table 1. Comparison of the Clown and i386 instruction sets

Group              i386  Clown    Group         i386  Clown
Data movement         8     13    Arithmetic      12     12
Shift / Rotate       12      8    Logical          6     11
Bits and Bytes       39      8    Flag control    11      6
Processor control     4      3    Flow control    77     18
Memory protection     1      5    I/O              4      3
Other                25
Total               200     87

The DMA controller works concurrently with the 
rest of the Clown system. Its implementation as a part of 
the main fetch-decode-execute loop would involve com- 
plex serialization and synchronization issues. For in- 
stance, one Clown out instruction triggers a disk-to- 
memory transfer which takes a significant and uncertain 
number of instructions to complete (due to seek and ro- 
tational latencies). Calling a read-sector function is not 
an option, because it would hinder the main loop. 

As a result, the controller is implemented using
µCVM (Microcontroller Virtual Machine) to enable true
concurrent execution of the main simulator and the
simulator of the controller and also to make the controller
potentially reconfigurable. µCVM is a "micro-Clown":
it has 8 general-purpose registers, a one-bit flag register,
and a program counter. The instruction set consists of
10 commands (see Table 2).
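
To give a feel for how small such a machine is, here is a
minimal Python sketch of an interpreter for the arithmetic
and control instructions listed in Table 2. It is an
illustration only. The semantics are assumptions: xCMPI is
taken to set the flag when the register equals the
constant, JEQ to jump when the flag is set, and registers
are taken to be 32 bits wide. Instructions are represented
symbolically as tuples, and jump targets are indices into
the program list rather than encoded addresses.

```python
def run_ucvm(program, max_steps=10_000):
    """Interpret a list of (mnemonic, *operands) tuples; return the register file."""
    regs = [0] * 8   # eight general-purpose registers
    flag = False     # one-bit flag register
    pc = 0           # program counter (index into `program`)
    for _ in range(max_steps):
        op, *args = program[pc]
        pc += 1
        if op == "NOP":
            pass
        elif op == "END":
            return regs
        elif op == "JMP":
            pc = args[0]
        elif op == "JEQ":
            if flag:  # assumed: jump when the last comparison was equal
                pc = args[0]
        elif op == "xMOVI":
            regs[args[0]] = args[1] & 0xFFFFFFFF
        elif op == "xADDI":
            regs[args[0]] = (regs[args[0]] + args[1]) & 0xFFFFFFFF
        elif op == "xCMPI":
            flag = regs[args[0]] == (args[1] & 0xFFFFFFFF)
        else:
            raise ValueError(f"unsupported instruction: {op}")
    raise RuntimeError("step limit reached without END")


# A loop that counts register 0 up to 3 and stops.
COUNT_TO_THREE = [
    ("xMOVI", 0, 0),
    ("xADDI", 0, 1),
    ("xCMPI", 0, 3),
    ("JEQ", 5),
    ("JMP", 1),
    ("END",),
]
```

The I/O and memory instructions are left out because their
effect depends on the bus and device models.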

Figure 3. µCVM system architecture

The main loop of the simulator first executes the
next Clown instruction. If it is not a memory reference
or an I/O instruction, then the next µCVM instruction
is executed. Otherwise, the next µCVM instruction is
executed only if it is not a memory reference or an I/O
instruction.
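
The interleaving rule above amounts to stalling the µCVM
only when both machines need the bus in the same
iteration. A small Python sketch makes this explicit; the
function names and the representation of each instruction
stream as a list of booleans (True when the instruction is
a memory reference or an I/O instruction) are hypothetical.

```python
def ucvm_may_step(clown_uses_bus: bool, ucvm_uses_bus: bool) -> bool:
    """The µCVM advances unless both instructions need the bus."""
    return not (clown_uses_bus and ucvm_uses_bus)


def interleave(clown_bus_use, ucvm_bus_use):
    """Return, per main-loop iteration, whether the µCVM advanced."""
    advanced = []
    pc = 0  # index of the next µCVM instruction
    for clown_uses_bus in clown_bus_use:
        if pc < len(ucvm_bus_use) and ucvm_may_step(clown_uses_bus, ucvm_bus_use[pc]):
            pc += 1
            advanced.append(True)
        else:
            advanced.append(False)
    return advanced
```

The Clown CPU always wins the bus, so a DMA transfer slows
down when the main program is memory-intensive but never
delays it.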

The main program of the µCVM, which controls both
disk-to-memory and memory-to-disk transfers, fits in
just 132 bytes of the controller memory.

The code simulating the peripheral devices is
organized as dynamically loaded libraries (one device per
library). This organization turned out to be flawed: while
it did not significantly contribute to the
reconfigurability of the simulator, it undermined its
integrity. In future releases, all code pieces will be
linked together.

ASSEMBLY LANGUAGE 

The Clown assembly language (cas) uses a mixture of
"Intel-style" and "MIPS-style" syntax. It has 53
commands and 90 modifications. The language allows
decimal, octal, and hex numbers (in prefix and postfix
notation), and ASCII characters and strings. Because Clown
does not have a byte data type, characters and strings
are translated into words and arrays of words, one
character per word. This cumbersome conversion leads to
"sparse" strings and poor memory utilization, but
significantly reduces the number of machine instructions.
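
The cost of the one-character-per-word representation is
easy to quantify. The Python sketch below is only an
illustration of the encoding described above; the function
names are hypothetical, and whether cas appends a
terminator word to strings is not stated, so none is added
here.

```python
BYTES_PER_WORD = 4  # Clown words are 32 bits wide


def string_to_words(text):
    """Encode an ASCII string as a list of words, one character per word."""
    return list(text.encode("ascii"))


def memory_utilization(text):
    """Fraction of the allocated bytes that actually carry character data."""
    words = string_to_words(text)
    if not words:
        return 0.0
    return len(text) / (len(words) * BYTES_PER_WORD)
```

Every string therefore occupies four times the memory of a
byte-packed representation, in exchange for an instruction
set with no byte loads, stores, or string operations.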

Before assembling, the source code is run through a 
standard C preprocessor (cpp). 

The cas assembler supports multiple segments (if
needed) and global symbols. It can produce raw "bin"
executable files, without headers and symbol tables, and
structured multi-segment "exe" files, with symbol tables
and provisions for further linking with other files of the
same kind, using the Clown linker (clink). "Exe" files are
fully relocatable. The displacement of the entry point
into a "bin" file can be specified at assembly time.
"Bin" files can be used as directly loadable ROM/RAM
images. The Clown simulator can simultaneously load
several executable images (for instance, to simulate
several processes without writing an OS loader).

So far, Clown does not include a run-time loader. 
Loaders are heavily OS-dependent, and should be writ- 
ten by OS developers. 
Table 2. µCVM Instruction Set

        Single-word instructions                     Double-word instructions
Opcode  Instruction     Description          Opcode  Instruction      Description

Arithmetic and control instructions (AC)
0h      NOP             Do nothing           4h      xMOVI reg, val   Store a constant
1h      JEQ dest        Conditional jump     5h      xADDI reg, val   Add a constant
2h      JMP dest        Unconditional jump   6h      xCMPI reg, val   Compare
3h      END             Stop the VM          7h                       reserved

I/O and memory instructions (IOM)
8h      OUT port, reg   Output to the port   Ch      xOUTI port, val  Output a constant
9h      IN port, reg    Input from the port  Dh                       reserved
Ah      ST [reg], reg   Store indirectly     Eh                       reserved
Bh      LD [reg], reg   Load indirectly      Fh                       reserved



FEASIBILITY AND PERFORMANCE 
EVALUATION 

The Clown architecture is meant to be feasible in
the sense that, should a need arise to implement it
either in an FPGA or directly in hardware, doing so will
not pose significant risks or challenges. This estimate is
based on the author's experience with the FLUX
superconductor microprocessor design [3].

In the experiments, the Clown system simulated 4 
million instructions per second (4 MIPS) on a 1.3 GHz 
Pentium host CPU (native performance 2600 MIPS). 
The following code was used for performance evaluation: 

        mov %r1, 10000000
again:  dec %r1
        jnz again
        stop

This performance is roughly equivalent to Intel 8086 
(4.77 MHz), which is reasonably good for real-time user 
interaction. 



CLOWN IN A CLASSROOM

The Clown system was "field tested" in an
undergraduate Operating Systems course during the Spring
2004 semester. The following eight assignments were
given throughout the semester to the students who had
taken an introductory assembly language course based
on the Intel 8086 architecture. For each assignment, the
typical length of the program code in non-commented
lines of code (NLOC) is given.

• kputs: display a character string, using the
terminal; 60 NLOCs

• boot: load the first sector of the first track, using
polling, and execute its contents; 25 NLOCs

• boot-dma: load the first sector of the first track,
using DMA, and execute its contents; 15 NLOCs

• int-timer: populate and test the interrupt vector
(timer ISR); 60 NLOCs

• int-kbd: populate and test the interrupt vector
(timer ISR and keyboard ISR, competing for
a counter variable); 85 NLOCs

• page-table: populate and test the page table; 15
NLOCs

• page-fault: populate and test the interrupt vector
(page fault handler); 25 NLOCs

• file: traverse a disk file organized as a linked list;
30 NLOCs

Out of 14 students taking the class, nine
successfully completed all assignments, three completed 7
assignments, and the remaining two completed 5
assignments.



CONCLUSION AND FUTURE WORK 

The Clown system is a powerful, simple, fast,
configurable, and extensible microprocessor simulator that
can be used in various college-level courses, especially in
those dealing with operating systems.

Future work includes developing a debugger, a col- 
lection of sample run-time loaders for multi-segment ex- 
ecutable files, a C compiler, and a graphical user inter- 
face. The simulator may be further optimized for speed. 
Pipelining support needs to be added to provide more 
realistic simulation if the package is to be used in a com- 
puter architecture class. Networking, mouse, and graph- 
ics mode support would enable many other uses of the 
simulator (such as a vehicle in a PDA graphics study). 